Skip to content

Conversation

@nrseman
Copy link

@nrseman nrseman commented Oct 7, 2025

This PR introduces mixed-precision building blocks for Krylov subspace methods and a highly optimized mixed-precision implementation of ILU0 + BiCGSTAB. Initial testing indicate 1.6x speed-up per linear iteration with minimal impact on iteration count. For information on how to activate the new solver see the README.md file in the opm/simulators/linalg/mixed folder.

Looking forward to your feedback.

PS! After rebasing on master yesterday, the SPE10 case I used as one of my performance tests failed endpoint validation due to negative SOGCR. I have not investigated further, but it seems to be a finite precision issue. You can see my hack to work around the issue in the mixed-dev branch on my fork

@blattms blattms added the manual:enhancement This is an enhancement/improvent that needs to be documented in the manual label Oct 7, 2025
@blattms
Copy link
Member

blattms commented Oct 7, 2025

jenkins build this please

@akva2
Copy link
Member

akva2 commented Oct 8, 2025

i see malloc(), yet no free(). i see new[] yet no delete[].

mike drop.

@multitalentloes
Copy link
Member

1.6 speedup is very promising @nrseman, great work!
I am quite interested in this topic, so I'll ask for a few questions if you don't mind (I have not yet looked at the exact implementation).

  1. what is being done in full precision, and what is being done with single (or even lower) precision (linear solver, preconditioner, specific operations, parts of matrices etc)?
  2. And was that combination of precisions attained experimentally by trial and error or has there been specific tools used to uncover calculations that are not very sensitive to the precision being used, or other types of arguments made to identify the particular mixed-precision schemes you have identified?

@nrseman
Copy link
Author

nrseman commented Oct 8, 2025

i see malloc(), yet no free(). i see new[] yet no delete[].

mike drop.

I have renamed all *_new() methods to *_alloc() and added corresponding *_free() methods which are called from the destructor. @akva2, I don't think I'm using new[] anywhere. If you still see an issue please let me know.

@nrseman
Copy link
Author

nrseman commented Oct 8, 2025

Thank you for your interest and your questions, @multitalentloes. I don't want to start a detailed technical discussion in this PR request. I'll contact you separately.

Copy link
Member

@multitalentloes multitalentloes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution to OPM, I have some surface-level feedback on the code right now, and also taken the opportunity to note down some of the goals for this code further down that line.

Immediate code-fixes:

  • Compilation error when compiling with CUDA/HIP (see comment)
  • Compilation error type mismatch during variable assignment (see comment)
  • Ensure that this PR does not break OPM compilation for ARM CPUs (as they do not support avx2)

General stuff:

  • Use docstrings, see examples from other files in the codebase
  • Use clang-format to make it more consistent with the rest of the codebase
  • Use fmt::print instead of printf

Future work:

  • Registering the preconditioner in the factory (currently trivial placeholder). This will make it available to other multi-stage preconditioners in OPM, for instance letting us combine it with the default CPRW preconditioner, which is would have a great impact if performance is improved! From what we discussed outside of this PR this preconditioner is implemented with knowledge of how the BiCGSTAB is performed and that is something that should be looked more into.
  • Having these preconditioners registered in the MPI factory as well with wrapBlockPreconditioner so it can be used in parallel simulations

DUNE_UNUSED_PARAMETER(prm);
return std::make_shared<MultithreadDILU<M, V, V>>(op.getmat());
});
F::addCreator("mixed-ilu0", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line produces compilation warnings due to unused variables

Copy link
Author

@nrseman nrseman Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by adding

DUNE_UNUSED_PARAMETER(op);

and will be included in my next update.

DUNE_UNUSED_PARAMETER(prm);
return std::make_shared<TrivialPreconditioner<V,V>>();
});
F::addCreator("mixed-dilu", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused variable warnings here as well

Copy link
Author

@nrseman nrseman Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by adding

DUNE_UNUSED_PARAMETER(op);

and will be included in my next update.

virtual void update() override {};
virtual bool hasPerfectUpdate() const override {return true;}
virtual void pre (X& x, Y& y) override {};
virtual void post (X& x) override {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused variable warnings for all of these functions with arguments as well with the empty body. I think this has been resolved other places in the code with [[maybe_unused]] or a similar attribute

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by prepending all arguments by [[maybe_unused]] and will be part of my next update.

printf("n = %lu\n",x.dim());
}

void exportSparsity(const char *path='.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

causes compilation error '.' is a char, replace with "." to make it a string/char*

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I don't get a compliation error, but your observation is correct and I have made the fix.



/*
void mat3_view(double const *M, char const *name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove dead code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


}

int bslv_pbicgstab3m(bslv_memory *mem, bsr_matrix *A, const double *b, double *x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general for new functions: Use docstring to document what the function does and what its expected inputs, outputs, and special assumptions are to make the implementation more accessible

Copy link
Author

@nrseman nrseman Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@multitalentloes: I'm a bit confused about what you mean by docstring. Do you simply mean a multiiline comment or is it required to adhere to the doxygen standard? I see a mix in the current code. Also, is there a policy on whether the documentation is attached to the function definition of the function declaration?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it should adhere to the doxygen standard for those who use that. I would guess declaration is best, at least that is what I have done when declaration is in a separate headerfile





Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use clang-format on all the files to ensure that things are compatible with existing code, easiest to just to it now from the start to avoid for instance weird whitespacing such as these blank lines

{
public:

MixedSolver(M A, double tol, int maxiter, bool use_dilu)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit dubious to taking a copy of A here, it seems to work for some inexplicable reasons, but why not take a const ref (const M& A)? I think as it stands, the code is storing a pointer (data_) to a memory location that will be deleted after invocation. No idea why this works in practice here, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this works implies that a copy of the matrix object A is a shallow copy. Nevertheless, your solution is better and the change will be included in my next update.

@nrseman
Copy link
Author

nrseman commented Nov 19, 2025

I have addressed all trivial issues and pushed the updates to the PR branch. What remains is the following:

  • Compile on ARM: Based on conversations with @blattms, the solution will be to perform a compile time check for avx2 support and conditionally enable the mixed solvers. This implies that the functionality will not be able on ARM for now.
  • Fix GPU compilation errors: I'd appreciate any help on this as I am currently not set up to compile for GPUs. @kjetilly, @multitalentloes is this something either of you can help me with?
  • Add function descriptions: Doc strings it is ...
  • clang-format: I'm saving this to the end.

@nrseman
Copy link
Author

nrseman commented Nov 19, 2025

@blattms: Would the file below work for the compilation test we discussed? I have verified that the code compiles with the following command

gcc -mavx2 -c test_avx2.c

test_avx2.c

#include <immintrin.h>

typedef
struct bsr_matrix
{
    int nrows;
    int ncols;
    int nnz;
    int b;

    int *rowptr;
    int *colidx;
    double *dbl;
    float *flt;

} bsr_matrix;

void bsr_vmspmv3(bsr_matrix *A, const double *x, double *y)
{
    int nrows = A->nrows;
    int *rowptr=A->rowptr;
    int *colidx=A->colidx;
    const float *data=A->flt;

    const int b=3;

    __m256d mm_zeros =_mm256_setzero_pd();
    for(int i=0;i<nrows;i++)
    {
        __m256d vA[3];
        for(int k=0;k<3;k++) vA[k] = mm_zeros;
        for(int k=rowptr[i];k<rowptr[i+1];k++)
        {
            const float *AA=data+9*k;

            int j = colidx[k];
            __m256d vx = _mm256_loadu_pd(x+b*j);

            vA[0] += _mm256_cvtps_pd(_mm_loadu_ps(AA+0))*_mm256_permute4x64_pd(vx,0b00000000); //0b01010101
            vA[1] += _mm256_cvtps_pd(_mm_loadu_ps(AA+3))*_mm256_permute4x64_pd(vx,0b01010101); //0b01010101
            vA[2] += _mm256_cvtps_pd(_mm_loadu_ps(AA+6))*_mm256_permute4x64_pd(vx,0b10101010); //0b01010101
        }

        // sum over columns
        __m256d vy, vz;
        vz = vA[0] + vA[1] + vA[2];

        double *y_i = y+b*i;
        vy = _mm256_loadu_pd(y_i);       // optional blend to keep
        vz =_mm256_blend_pd(vy,vz,0x7);  // 4th element unchanged
        _mm256_storeu_pd(y_i,vz);
    }
}

@kjetilly
Copy link
Contributor

@nrseman I think the only thing needed for this to compile on with GPU support enabled is the change in FlexibleSolver_impl.hpp here: kjetilly@88cfe8a . Essentially protecting the call to your code in a if constexpr (!is_gpu_operator...), so that the compiler never tries to use your code with a GPU matrix.

@nrseman
Copy link
Author

nrseman commented Nov 24, 2025

@kjetilly: After updating my toolchain I ran into the posix_memalign visibility issue you have mentioned. I am a bit surprised because I was under the impression that all recent versions of glibc make it visible by default. Anyways, my fix is similar to yours and already included among the pushed updates.

@blattms
Copy link
Member

blattms commented Nov 25, 2025

@blattms: Would the file below work for the compilation test we discussed? I have verified that the code compiles with the following command

Please add a main method to make this compile. We need to be able to run this program.

Copy link
Member

@blattms blattms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be some kind of tests that makes sure that the solver runs at least. Currently, we will only know whether it compiles.

If you do C allocations, then you should probably check the results. I am not sure whether the memory is sufficient for arrays that are later realigned.

Is there are any code that checks that:

  • the blocks have the correct size
  • that the run is serial and not parallel
  • etc.

We need to prevent users from misusing this.
I might have missed something of course.


prec_t *prec_alloc()
{
prec_t *P = malloc(sizeof(prec_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A check seems missing here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now checking with assert.


bsr_matrix* bsr_alloc()
{
bsr_matrix *A=malloc(sizeof(bsr_matrix));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result of malloc should probably be checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now checking with assert.


bslv_memory *bslv_alloc()
{
bslv_memory *mem = malloc(sizeof(bslv_memory));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check result?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now checking with assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good enough for OPM, because we never compile with -DNDEBUG to not make our simulator to fast 😅 One day that might change, though.

For "normal" projects I would use a check that even triggers in production. Asserts should be used to catch programming errors. This seems to be something that happens if system resources are exhausted.

{
int nrows;
int ncols;
int nnz;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using int for nnz limit us too much?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we start running models with more than 2 billion connections (~25M cells) then yes.
Let us just not lose sight of the fact that we are trying to get a fundamentally new feature into the simulator. Given that the current implementation only allows sequential runs, it is very unlikey that anyone will try to apply it to extrelemy large models.

@blattms
Copy link
Member

blattms commented Nov 25, 2025

Unfortunately, you will need to rebase this because of conflicts,

@blattms
Copy link
Member

blattms commented Nov 25, 2025

This module might get used by others. Please make sure that the new headers are installed by listing them in CMakeLists_files.cmake variable PUBLIC_HEADER_FILES

@blattms
Copy link
Member

blattms commented Nov 25, 2025

I am also getting some warnings:

/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:505:22: warning: operation on '((Opm::TpfaLinearizer<Opm::Properties::TTag::FlowWaterOnlyProblem>*)this)->Opm::TpfaLinearizer<Opm::Properties::TTag::FlowWaterOnlyProblem>::exportCount_' may be undefined [-Wsequence-point]
  505 |         exportCount_ = exportIndex_==idx ? ++exportCount_ : 0;
      |         ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:27: warning: unused parameter 'idx' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                       ~~~~^~~      |                       ~~~~^~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:38: warning: unused parameter 'tag' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                                ~~~~~~^~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:55: warning: unused parameter 'path' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                                           ~~~~~~~~~~~~^~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:507:32: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
  507 |         sprintf(tag,"_%03d_%02d",exportIndex_, exportCount_);
      |                                ^
In member function 'void Opm::TpfaLinearizer<TypeTag>::exportSystem(int, char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]',
    inlined from 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/simulators/flow/BlackoilModel_impl.hpp:300:57:
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:507:16: note: 'sprintf' output between 8 and 25 bytes into a destination of size 16
  507 |         sprintf(tag,"_%03d_%02d",exportIndex_, exportCount_);
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh: In member function 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]':
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:527:29: warning: '%s' directive writing up to 15 bytes into a region of size between 1 and 256 [-Wformat-overflow=]
  527 |         sprintf(filename,"%s%s.f64",name,tag);
      |                             ^~
In member function 'void Opm::TpfaLinearizer<TypeTag>::exportVector(GlobalEqVector&, const char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]',
    inlined from 'void Opm::TpfaLinearizer<TypeTag>::exportSystem(int, char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:518:21,
    inlined from 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/simulators/flow/BlackoilModel_impl.hpp:300:57:
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:527:16: note: 'sprintf' output between 5 and 275 bytes into a destination of size 256
  527 |         sprintf(filename,"%s%s.f64",name,tag);
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have not checked this and it might be false postives on architecture ppc64el. This is for a rebased version.

@blattms
Copy link
Member

blattms commented Nov 25, 2025

It seems like I can call the simulator without --matrix-add-well-contributions=true and do not get an error.

What does happen in this case?

Maybe we should enforce that option somehow?

_mm256_storeu_pd(xi,vz);

//double z[4];
//_mm256_store_pd(z,vz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some lines of dead code that look like they can be removed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to leave code snippets in because they are alternative ways of doing the something. It helps me when I go back to the code after a while. Nevertheless, I have removed it just for you. I have removed other commented out code as well.

for(int k=0;k<9;k++) C[k]=M[k];
}

void mat3_matfms(double *C, const double *A, const double *B)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and other matrix helper functions are not documented

Maybe also extract utility functions as this one to a separate file as it is not really specific to the precs themselves?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not extract the utility functions into a separate file for the following reasons:

  1. scope creep: This was not a part of your original review and not essential for merging.
  2. scope: They are currently designed for inline use in the preconditioners, and I do not intend to support general use of them at this point.

Copy link
Member

@blattms blattms Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document with the mathematical expressions that the functions represent to help other people. Also for the other functions. This will save people a lot of time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added documentation of to the above function and other local functions.

@multitalentloes
Copy link
Member

multitalentloes commented Dec 2, 2025

Sorry for the delay.

Here is an updated list of things that should be in place before merging:

  • More documentation (I highlighted some functions, structs etc that must be documented, also would like use of compiler intrinsics to be explained so the code is more available to all developers)
  • use fmt for printing output, it is more robust and widely used in OPM, currently printf is used
  • OPM compiles fine on machines without AVX2 (has this been tested? Does it depend on Added CMake check for AVX2 opm-common#4840?)
  • clang-format must also be run before merging

With these resolved then we can get this preliminary version merged that should be expanded upon in the way discussed earlier in this PR.

@blattms
Copy link
Member

blattms commented Dec 3, 2025

@blattms: I added a constexpr guard to skip the mixed precision solver when single-precision vectors are used. The mixed-precision code block in FlexibleSolvers_impl.hpp now looks as follows:

#if HAVE_AVX2_EXTENSION
          } else if (solver_type == "mixed-bicgstab") {
              if constexpr (Opm::is_gpu_operator_v<Operator>) {
                OPM_THROW(std::invalid_argument, "mixed-bicgstab solver not supported for GPU operatorsg");
            } else if constexpr (std::is_same_v<typename VectorType::field_type, float>){
                OPM_THROW(std::invalid_argument, "mixed-bicgstab solver not supported for single precision.");
            } else {
                const std::string prec_type = prm.get<std::string>("preconditioner.type", "error");
                bool use_mixed_dilu= (prec_type=="mixed-dilu");
                using MatrixType = decltype(linearoperator_for_solver_->getmat());
                linsolver_ = std::make_shared<Dune::MixedSolver<VectorType,MatrixType>>(
                                                                            linearoperator_for_solver_->getmat(),
                                                                            tol,
                                                                            maxiter,
                                                                            use_mixed_dilu
                                                                        );
#endif

The code does the right thing when BUILD_FLOW_FLOAT_VARIANTS is OFF, but fails with UMFPACK related errors when it is ON.

/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh: In substitution of ‘template<class M> using Dune::Impl::UMFPackRangeType = typename Dune::Impl::UMFPackVectorChooser<M>::range_type [with M = Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >]’:
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:274:11:   required from ‘class Dune::UMFPack<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > > >’
  274 |     using range_type = Impl::UMFPackRangeType<M>;
      |           ^~~~~~~~~~
/home/khaugen/repos/opm/simulators/opm/simulators/wells/MultisegmentWellEquations.cpp:162:60:   required from ‘void Opm::MultisegmentWellEquations<Scalar, IndexTraits, numWellEq, numEq>::apply(const BVector&, BVector&) const [with Scalar = float; IndexTraits = Opm::BlackOilDefaultFluidSystemIndices; int numWellEq = 4; int numEq = 3; BVector = Dune::BlockVector<Dune::FieldVector<float, 3>, std::allocator<Dune::FieldVector<float, 3> > >]’
  162 |     const BVectorWell invDBx = mswellhelpers::applyUMFPack(*duneDSolver_, Bx);
      |                                                            ^~~~~~~~~~~~~
/home/khaugen/repos/opm/simulators/opm/simulators/wells/MultisegmentWellEquations.cpp:476:1:   required from here
  450 |     template class MultisegmentWellEquations<T,BlackOilDefaultFluidSystemIndices,numWellEq,numEq>;                               \
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:181:29: error: invalid use of incomplete type ‘struct Dune::Impl::UMFPackVectorChooser<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >, void>’
  181 |     template<class M> using UMFPackRangeType = typename UMFPackVectorChooser<M>::range_type;
      |                             ^~~~~~~~~~~~~~~~
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:175:12: note: declaration of ‘struct Dune::Impl::UMFPackVectorChooser<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >, void>’
  175 |     struct UMFPackVectorChooser;
      |            ^~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/opmsimulators.dir/build.make:1972: CMakeFiles/opmsimulators.dir/opm/simulators/wells/MultisegmentWellEquations.cpp.o] Error 1
make[2]: Leaving directory '/home/khaugen/repos/opm/simulators/build'
make[1]: *** [CMakeFiles/Makefile2:716: CMakeFiles/opmsimulators.dir/all] Error 2
make[1]: Leaving directory '/home/khaugen/repos/opm/simulators/build'
make: *** [Makefile:149: all] Error 2

I've pushed my latest version to a new branch mixed-float. It is hard to see how my three lines of code is the culprit! Are there additional configuration variables I need to set?

Probably due (circular) header inclusion order. at the place the class needs to be defined to get its size, but it has only been forward declared, yet.

The UMFPackVectorChooser in dune-istl is only specialised for double and std::complex and not float and the template class chosen misses the typenames.

We need to work around this here. The question is why the multisegment wells are used with float...
Maybe there is more information in the compile log?

@blattms
Copy link
Member

blattms commented Dec 3, 2025

jenkins build this serial rocm hipify please

@blattms
Copy link
Member

blattms commented Dec 3, 2025

BTW on my system this PR compiles. I am checking in a chroot without avx support now.

@blattms
Copy link
Member

blattms commented Dec 3, 2025

The code does the right thing when BUILD_FLOW_FLOAT_VARIANTS is OFF, but fails with UMFPACK related errors when it is ON.

Maybe this problem is also in master?

@bska
Copy link
Member

bska commented Dec 3, 2025

The code does the right thing when BUILD_FLOW_FLOAT_VARIANTS is OFF, but fails with UMFPACK related errors when it is ON.

Maybe this problem is also in master?

For what it's worth, the CI system builds the master sources with BUILD_FLOW_FLOAT_VARIANTS set to ON.

@nrseman
Copy link
Author

nrseman commented Dec 3, 2025

The code does the right thing when BUILD_FLOW_FLOAT_VARIANTS is OFF, but fails with UMFPACK related errors when it is ON.

Maybe this problem is also in master?

For what it's worth, the CI system builds the master sources with BUILD_FLOW_FLOAT_VARIANTS set to ON.

Which implies it is not a problem in master. I just compiled my branch against the tip of the current master branch of opm-common where the avx-check logic is in place and I see the same issues. I'll push the changes to the mixed-pr branch so that we can let jenkins ci take a stab at it ...

@nrseman
Copy link
Author

nrseman commented Dec 3, 2025

Note that the latest version still compiles with BUiLD_FLOW_FLOAT_VARIANTS=OFF so it would be nice if someone can verify that it compiles on a cpu that does not support avx2 and confirm the expected runtime behavior.

@nrseman
Copy link
Author

nrseman commented Dec 3, 2025

BTW on my system this PR compiles. I am checking in a chroot without avx support now.

How did it go?

@blattms
Copy link
Member

blattms commented Dec 3, 2025

BTW on my system this PR compiles. I am checking in a chroot without avx support now.

How did it go?

Compilation works as expected.

@blattms
Copy link
Member

blattms commented Dec 3, 2025

jenkins build this serial rocm hipify please

@blattms
Copy link
Member

blattms commented Dec 4, 2025

The UMFPackVectorChooser in dune-istl is only specialised for double and std::complex and not float and the template class chosen misses the typenames.

We need to work around this here. The question is why the multisegment wells are used with float... Maybe there is more information in the compile log?

Neither I nor jenkins are seeing this. Hence this is not an issue for merging.

If your are still seeing this, then please open an issue and provide enough information to replicate

@blattms
Copy link
Member

blattms commented Dec 4, 2025

  • clang-format must also be run before merging

@multitalentloes This is not even possible on my system:

$ clang-format opm/simulators/linalg/mixed/*
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument
/home/mblatt/src/dune/opm/opm-simulators/.clang-format:17:9: error: unknown key 'InsertNewlineAtEOF'
        InsertNewlineAtEOF: true,
        ^~~~~~~~~~~~~~~~~~
Error reading /home/mblatt/src/dune/opm/opm-simulators/.clang-format: Invalid argument

for(int k=0;k<9;k++) invA[k]=M[k]/detA;
}

inline void vec_copy9(double *y, double const *x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides all this nitpicking above there is a serious issue. The code does not link because of inline functions when I compile with -O0:

[100%] Linking CXX executable bin/flow
/usr/bin/ld: lib/libopmsimulators.so.2026.04: undefined reference to `vec_copy9'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/flow.dir/build.make:233: bin/flow] Fehler 1
make[2]: *** [CMakeFiles/Makefile2:6973: CMakeFiles/flow.dir/all] Fehler 2
make[1]: *** [CMakeFiles/Makefile2:6980: CMakeFiles/flow.dir/rule] Fehler 2
make: *** [Makefile:3195: flow] error 2
Suggested change
inline void vec_copy9(double *y, double const *x)
void vec_copy9(double *y, double const *x)

or find another way to include the symbol. The inline is only a hint. Is it really needed? https://stackoverflow.com/questions/19068705/undefined-reference-when-calling-inline-function

The other inline functions here have no problem. That might be because they are actually not inlined?

Copy link
Author

@nrseman nrseman Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides all this nitpicking above there is a serious issue. The code does not link because of inline functions when I compile with -O0

So far, I have not been able to reproduced the issue on my end. Do you mind trying

static inline void vec_copy9(double *y, double const *x)

It should limit the scope to the file where the function is defined. I'll keep trying to reproduce the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blattms: I've been able to reproduce the -O0 linking issue and verified that static inline fixes it. It will be among the changes I will push shortly.

@nrseman
Copy link
Author

nrseman commented Dec 4, 2025

@blattms, @multitalentloes:

I just pushed an update that take care of your remaining concerns, i.e.

  • documentation
  • printf->fmt::print
  • -O0 linking issue

My understanding is that the AVX2 compilation issue taken care of (right Markus?)

That leaves us with two items:

  • Documentation of avx2 intrinsics: I think this is out of scope. I have now documented what the functions using them do and there are better sources for learning about them than what can be provided in source code.
  • Run clang-format: This is technically possible, but really wrecks havoc on my code, making rebasing my development branch on top of it a nightmare. Unless this is a hard requirement, I'd like to skip it this time around. Instead, I can gravitate towards using it going forward.

So what do you think? Are we ready for merging?

@multitalentloes
Copy link
Member

Things are okay from my side now for this version of the code.

@multitalentloes
Copy link
Member

jenkins build this please

Copy link
Member

@blattms blattms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. I retested the debugging build. In it goes.

With a light protest because of misusage of assert, though 😅

@blattms blattms merged commit 6aaf6ff into OPM:master Dec 5, 2025
2 checks passed
@nrseman
Copy link
Author

nrseman commented Dec 5, 2025

With a light protest because of misusage of assert, though 😅

I'll move away from them going forward.

Thanks for all your support!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

manual:enhancement This is an enhancement/improvent that needs to be documented in the manual

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants